Atomic Read-Modify-Write Operations are Unnecessary for Shared-Memory Work Stealing

Authors

  • Umut Acar
  • Arthur Charguéraud
  • Stefan Muller
  • Mike Rainey

Abstract

We present a work-stealing algorithm for total-store-order memory architectures, such as Intel’s x86, that does not rely on atomic read-modify-write instructions such as compare-and-swap. In our algorithm, processors communicate solely by reading from and writing (non-atomically) into weakly consistent memory. We also show that join resolution, an important problem in scheduling parallel programs, can be solved without atomic read-modify-write instructions. At a high level, our work-stealing algorithm closely resembles traditional work-stealing algorithms, but certain details are more complex. Instead of relying on atomic read-modify-write operations, our algorithm uses a steal protocol that enables processors to perform load balancing by using only two memory cells per processor. The steal protocol permits data races but guarantees correctness by using a time-stamping technique. Proving the correctness of our algorithms is challenging because weakly consistent shared memory permits processors to observe sequentially inconsistent views. We therefore carefully specify our algorithms and prove them correct by considering a costed refinement of the x86-TSO model, a precise characterization of total-store-order architectures. We show that our algorithms are practical by implementing them as part of a C++ library and performing an experimental evaluation. Our results show that our work-stealing algorithm is competitive with state-of-the-art implementations, even on current architectures where atomic read-modify-write instructions are cheap. Our join resolution algorithm incurs a relatively small overhead compared to an efficient algorithm that uses atomic read-modify-write instructions.
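
For context, the sketch below shows the kind of atomic read-modify-write step that the abstract says the algorithm avoids: claiming a task with compare-and-swap. It is not the paper's steal protocol, whose details are not given here. The names `TaskRange` and `try_claim` are hypothetical and introduced only for illustration; the only library calls are the standard `std::atomic` `load` and `compare_exchange_weak`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical shared pool of tasks, identified by the indices [first, last).
struct TaskRange {
    std::atomic<std::uint32_t> next;  // index of the next unclaimed task
    std::uint32_t end;                // one past the last task index

    TaskRange(std::uint32_t first, std::uint32_t last) : next(first), end(last) {
        assert(first <= last);  // an inverted range would be a caller bug
    }
};

// Try to claim one task with compare-and-swap, an atomic read-modify-write.
// Returns true and stores the claimed index in `out` on success;
// returns false when no tasks remain.
bool try_claim(TaskRange& r, std::uint32_t& out) {
    std::uint32_t cur = r.next.load(std::memory_order_relaxed);
    while (cur < r.end) {
        // On failure, compare_exchange_weak reloads `cur` with the current
        // value, so the loop retries against what another thread published.
        if (r.next.compare_exchange_weak(cur, cur + 1,
                                         std::memory_order_acq_rel,
                                         std::memory_order_relaxed)) {
            out = cur;
            return true;
        }
    }
    return false;
}
```

When several threads call `try_claim` on the same `TaskRange`, the hardware guarantees that at most one compare-and-swap succeeds for any given index. The abstract's claim is that the same guarantee can be obtained on total-store-order machines with plain reads and writes plus time stamps.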

Similar resources

Greedy Sharing: Load Balancing on Weakly Consistent Memory

An efficient online scheduler is crucial for balancing irregular parallel computations in a multiprocessor system. Over the last two decades, variants of the work-stealing scheduler have emerged as a popular choice for hardware shared-memory systems. The state-of-the-art work-stealing algorithms can guarantee near-optimal asymptotic complexity by relying on simple yet powerful techniques to bal...

Local Read-Write Operations in Sensor Networks

Designing protocols and formulating convenient programming units of abstraction for sensor networks is challenging due to communication errors and platform constraints. This paper investigates properties and implementation reliability for a local read-write abstraction. Local read-write is inspired by the class of read-modify-write operations defined for shared-memory multiprocessor architectur...

Fast Mutual Exclusion Algorithms Using Read-Modify-Write and Atomic Read/Write Registers

Three fast mutual exclusion algorithms using read-modify-write and atomic read/write registers are presented in a sequence, with an improvement from one to the next. The last algorithm is shown to be optimal in minimizing the number of remote memory accesses required in a resource busy period. Remote memory access is the key factor of memory access bottleneck in large shared-memory multiprocess...

A Tight Bound on Time Complexity of Mutual Exclusion

In distributed shared memory multiprocessors, remote memory accesses generate processor-to-memory traffic which may result in a bottleneck. It is therefore important to design algorithms that minimize the number of remote memory accesses. We establish a lower bound of 3 on remote access time complexity for mutual exclusion algorithms in a model where processes communicate by means of a general r...

Beyond Atomic Registers: Bounded Wait-Free Implementations of Nontrivial Objects

We define a class of operations called pseudo read-modify-write (PRMW) operations, and show that nontrivial shared data objects with such operations can be implemented in a bounded, wait-free manner from atomic registers. A PRMW operation is similar to a "true" read-modify-write (RMW) operation in that it modifies the value of a shared variable based upon the original value of that variable. Howe...

Publication date: 2015